---
layout: post
title: Measurement-Efficient Bayesian Inference
excerpt: Some notes and video lecture on using generative models to perform Bayesian inference with minimal measurements.
---

You can find a video presentation of these notes here: https://www.youtube.com/watch?v=QhXiyQYkc80
We consider problems of the form:

\[ y = f(x, n) \]

where

- \(y\) is an observed measurement,
- \(f\) is a measurement model,
- \(x\) is the target signal, and
- \(n\) is measurement noise.
We would like to know \(x\), but \(f\) is not invertible, hence the name ‘inverse problem’.
We like linear inverse problems:

\[ y = Hx + n \]

where \(H \in \mathbb{R}^{N\times M}\) has low rank and is therefore not invertible, so we can't simply compute \(x \approx H^{-1}y\). Further, because of this low rank, there are typically multiple values of \(x\) that solve \(y = Hx\), i.e. \(H\) is many-to-one.
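A tiny numerical illustration of this many-to-one behaviour, using a made-up \(H\) that simply discards the last two entries of \(x\):

```python
import numpy as np

# Hypothetical low-rank forward operator: keeps only the first 2 of 4 entries.
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 2.0, -5.0, 9.0])  # differs only in the unobserved entries

# Both signals produce exactly the same measurement, so H is many-to-one.
print(np.allclose(H @ x1, H @ x2))  # True
```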
For example:

```python
measured_fish = (mask * fish) + noise
```

In principle, any value for \(x\) in the masked region solves \(y = Hx + n\). But we want our algorithm to inpaint this region with "plausible" solutions. Bayes' Theorem lets us calculate exactly that:

\[ p(x \mid y) = \frac{p(y \mid x)\,p(x)}{p(y)} \]

\[ \text{with, e.g.:} \quad p(y \mid x) = \mathcal{N}(y;\, Hx,\, \sigma_y^2) \]

Now sampling \(x \sim p(x \mid y)\) should give plausible solutions – in particular those that have high probability under both the likelihood and the prior.
We can imagine this in terms of ‘slicing’ from the joint space only the region where \(Y=y\), and seeing which values for \(x\) co-occur with that.
Our prior then serves to constrain the image space, making our problem more tractable (less many-to-one).
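To make this concrete: when both the prior and the likelihood are Gaussian, the posterior is Gaussian in closed form and we can sample plausible \(x\) directly. The dimensions, operator, and noise level below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-Gaussian setup (all sizes here are illustrative).
M, N = 4, 2                      # signal dim, measurement dim
H = rng.normal(size=(N, M))      # low-rank measurement operator (N < M)
sigma_y = 0.1

x_true = rng.normal(size=M)      # draw from the prior p(x) = N(0, I)
y = H @ x_true + sigma_y * rng.normal(size=N)

# With Gaussian prior and likelihood, the posterior is Gaussian:
#   Sigma = (H^T H / sigma^2 + I)^{-1},   mu = Sigma H^T y / sigma^2
Sigma = np.linalg.inv(H.T @ H / sigma_y**2 + np.eye(M))
mu = Sigma @ H.T @ y / sigma_y**2

# Posterior samples: plausible reconstructions consistent with y AND the prior.
L = np.linalg.cholesky(Sigma)
samples = mu + (L @ rng.normal(size=(M, 5))).T   # 5 draws of x ~ p(x | y)
print(samples.shape)  # (5, 4)
```

The directions \(H\) observes are pinned down tightly by the likelihood, while the unobserved directions fall back on the prior, which is exactly the constraining effect described above.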
## Intuition: MNIST

Where would you sample? Why?
## Case Study: Accelerated MRI

The forward model for MRI:

\[ y = U\mathcal{F}(x) + n \]

where \(U\) is a subsampling mask and \(\mathcal{F}\) is the Fourier transform.
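A minimal sketch of this forward model, with an illustrative image size and a made-up every-4th-line sampling pattern:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of the accelerated-MRI forward model y = U F(x) + n.
x = rng.normal(size=(32, 32))            # stand-in "image"
k_space = np.fft.fft2(x)                 # F: 2D Fourier transform

U = np.zeros((32, 32))                   # subsampling mask over k-space lines
U[::4, :] = 1.0                          # keep every 4th row: 4x acceleration

noise = 0.01 * (rng.normal(size=(32, 32)) + 1j * rng.normal(size=(32, 32)))
y = U * k_space + U * noise              # we only measure the selected lines

print(int(U.sum()) / U.size)             # fraction of k-space measured: 0.25
```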
Imagine we only have 10 minutes to make our scan – which lines should we choose to get the most information about the ‘true’ image?
Another way of asking this question: which measurements \(y\) would leave us least uncertain about \(x\)?
Fortunately, we can measure this uncertainty using information entropy. In particular, we care about the posterior entropy, \(H(x \mid y)\). Our goal then becomes:
\[ U^* = \underset{U}{\text{arg min }} H(x \mid y) \quad \text{s.t. } U \text{ contains } K \text{ scan lines.} \]
Rather than trying to optimize all lines at once, we can do this iteratively, so that we can use the information we gather along the way to make better decisions. Note that the order doesn’t affect information gain since the Bayesian updates are commutative.
Along with some other tricks (see paper), this gives a new objective:
\[ \text{scan line}_t^* = \underset{\text{scan line}_t}{\text{arg max }} [H(y_t \mid U_t, y_{0:t-1})] \]
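One way to sketch this greedy rule: approximate the posterior with samples and rank each unmeasured line by how much its predicted measurement varies across them (for roughly Gaussian predictions, entropy is monotone in variance, so high variance means high entropy). The `posterior_samples` array below is a stand-in for real posterior draws, and the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for samples from the current posterior p(x | y_{0:t-1}).
posterior_samples = rng.normal(size=(16, 32, 32))      # 16 draws of x
measured = {0, 4, 8}                                   # already-acquired rows

# Predicted k-space for each sample, then per-row spread across samples.
k_samples = np.fft.fft2(posterior_samples, axes=(-2, -1))
line_var = np.abs(k_samples).var(axis=0).sum(axis=-1)  # shape (32,)

# Greedy step: acquire the unmeasured line with the highest predicted spread.
candidates = [r for r in range(32) if r not in measured]
next_line = max(candidates, key=lambda r: line_var[r])
print(0 <= next_line < 32 and next_line not in measured)  # True
```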
The measurements that we expect to be high-entropy are the ones we know least about, and therefore the ones that will give us the most new information.
How can we compute this entropy term? The option we chose: noising, i.e. the forward diffusion process.
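As a reminder of what the forward process looks like, here is a minimal sketch of the standard noising step \(x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon\) with a made-up linear \(\beta\) schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative linear beta schedule over T steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def noise(x0, t):
    # Forward diffusion: x_t = sqrt(a_bar_t) x_0 + sqrt(1 - a_bar_t) eps.
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.normal(size=(8, 8))
x_late = noise(x0, T - 1)     # by the last step, almost pure noise
print(alpha_bar[-1] < 1e-3)   # True: almost no signal remains
```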